Abstract

In software development, version control systems (VCS)
provide branching and merging support tools. Such tools are
popular among developers to concurrently change a codebase
in separate lines and reconcile their changes automatically
afterwards. However, two changes that are correct
independently can introduce bugs when merged together. We
call this kind of bug a semantic merge conflict.

Change impact analysis (CIA) aims at estimating the effects
of a change in a codebase. In this paper, we propose
to detect semantic merge conflicts using CIA. On a merge,
DeltaImpactFinder analyzes and compares the impact
of a change in its origin and destination branches. We call
the difference between these two impacts the delta-impact.
If the delta-impact is empty, then there is no indicator of a
semantic merge conflict and the merge can continue
automatically. Otherwise, the delta-impact contains the
possible sources of conflict.

1. Introduction 

Software projects are in constant evolution. Often, developers
perform changes concurrently in a codebase, generating
separate lines of development. Version control systems
(VCS) support this activity through branches, a widely used
feature [6] in software development. Merge [10] (also
called integration) is a fundamental operation in VCS that
reconciles two (or more) branches. VCS can detect syntactical
merge conflicts automatically. Nevertheless, semantic
merge conflicts (i.e., at the level of program behavior)
exceed the scope of these tools. Consider, for example, a
branch renaming a template method from A»foo to A»bar
and another branch overriding foo in a B class, subclass of
A. The two branches can be automatically merged but the
resulting code will fail to execute as intended; indeed B»foo
will never be executed while B»bar is supposed to exist but
does not. Such semantic merge conflicts are not detected by
current VCS.

Permission to make digital or hard copies of all or part of this work for personal or
classroom use is granted without fee provided that copies are not made or distributed
for profit or commercial advantage and that copies bear this notice and the full citation
on the first page. Copyrights for components of this work owned by others than the
author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or
republish, to post on servers or to redistribute to lists, requires prior specific permission
and/or a fee. Request permissions from permissions@acm.org.

IWST15, July 15 - 16, 2015, Brescia, Italy.

Copyright is held by the owner/author(s). Publication rights licensed to ACM.

ACM 978-1-4503-3857-8/15/07... $15.00.
http://dx.doi.org/10.1145/2811237.2811299

Change Impact Analysis (CIA) is an active research
field that aims at identifying the potential consequences of a
change in a codebase. Typically, CIA techniques establish
dependency relationships between the code entities of the
codebase. These relationships are used afterwards for
detecting the set of entities that are impacted by a change. The
rationale is that when a code entity changes, the behavior
of the dependent entities is impacted. Many CIA research
works compute the dependencies of a change in the original
codebase where the authors created the change [1] [5] [7] [12].

This paper proposes a solution to help integrators in
the detection of semantic merge conflicts using CIA. On
a merge, DeltaImpactFinder analyzes and compares the
impact of a change in its origin and destination branches.
We call the difference between these two impacts the
delta-impact. If the delta-impact is empty, there is no
indicator of a semantic merge conflict and the merge can
continue automatically. Otherwise, the delta-impact contains
the possible sources of conflict. The contributions of this
paper are the following:

• a description of semantic merge conflict with an example
(Section 2);

• a CIA technique, named DeltaImpactFinder, to detect
semantic merge conflicts (Section 3);

• a discussion of usages of this technique (Section 4);

• a prototype of this technique implemented in Pharo
(Section 5).


2. Problems when Merging: Semantic Conflicts

To show how semantic conflicts appear when merging, we
start by introducing an example of the Fragile Base Class
Problem (FBCP). Consider a logging library that implements
a Log class whose API has the methods log:, which records
a single message into an internal collection, and logAll:,
which records multiple messages in one shot using log:. The
logic for adding an element to the collection of logs is only
expressed inside the log: method. The following code
illustrates the most relevant points of this implementation:

Object subclass: #Log
    instanceVariableNames: 'messages'.

Log >> log: aMessage
    messages add: aMessage.

Log >> logAll: someMessages
    someMessages do: [ :each | self log: each ].

We now want to introduce a change in this library. At
some point in time, a developer starts a new branch of the
library from version A and implements a new feature: the
FilteredLog (Figure 1). FilteredLog is a subclass of Log that
overrides log: to record the message only when it satisfies
a filter. Note that FilteredLog does not need to override
Log»logAll:. We refer to this change as ΔF. The code
illustrating ΔF is the following:

+ Log subclass: #FilteredLog
+     instanceVariableNames: 'filterBlock'.
+
+ FilteredLog >> log: aMessage
+     (filterBlock value: aMessage)
+         ifTrue: [ super log: aMessage ].

In parallel, the main branch of the library evolves: the
method Log»logAll: no longer uses self log: to record each
received message, but instead each message is added directly
to the internal collection.

  Log >> logAll: someMessages
-     someMessages do: [ :each | self log: each ].
+     messages addAll: someMessages

When the integrator wants to merge ΔF into the main
branch, the tool does not report any merge conflict, but
the introduced feature does not work as expected. Indeed,
FilteredLog does not filter any messages when using logAll:
because this method does not send the log: message anymore.

log := FilteredLog new.
log filterBlock: [ :each | each > 0 ].
log logAll: #(-5).
log messages isEmpty. "false --> wrong!"

We can observe from this example that an integrator can
merge ΔF and silently introduce a semantic conflict. To detect
this kind of bug, integrators need to understand the code
in a change (Δ) more deeply, e.g., know the intention of the
change and the version in which it was developed. Integration
therefore requires considerable human effort, which new tools
can help to alleviate.

Figure 1. A developer starts a new branch from version A
of a logging library and implements a new feature: the
FilteredLog. We name this change ΔF. Before ΔF is
integrated into the library, a modification of the method
Log»logAll: in B breaks B + ΔF.

In a more general way, we would like a tool that helps the
integrator by answering the following questions:

Q1. Does a Δ produce semantic conflicts if merged into a
particular version of the codebase?

Q2. What code entities are involved in a semantic conflict
that Δ produces?

Q3. When was the change that produced a semantic conflict
with a Δ integrated?


In general these questions are hard to answer, especially
in large-scale projects where developers work in parallel on
the same codebase. Automated testing and continuous
integration practices could help in answering the first question.
However, these practices depend on test coverage: the more
thoroughly the code is tested, the more likely the problem
is to be detected. Unfortunately, test coverage is sometimes
insufficient, and improving it requires an effort that
developers do not make. We therefore pose the following
research question:

Can CIA on origin and destination branches help to answer
these questions?
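
The logging example of this section can be replayed outside Pharo. The following is a minimal Python sketch, not the authors' code: the classes mirror Log and FilteredLog, while `LogInB`, `make_filtered` and `is_positive` are hypothetical names introduced only to run the same change on top of both versions of the library.

```python
class Log:
    """Version A: log_all delegates to log for every message."""

    def __init__(self):
        self.messages = []

    def log(self, message):
        self.messages.append(message)

    def log_all(self, some_messages):
        for each in some_messages:
            self.log(each)  # self-send: subclasses can intercept it


class LogInB(Log):
    """Stand-in for Log in version B: log_all bypasses log."""

    def log_all(self, some_messages):
        self.messages.extend(some_messages)


def make_filtered(base):
    """Build the FilteredLog of the change on top of a given Log version."""

    class FilteredLog(base):
        def __init__(self, filter_block):
            super().__init__()
            self.filter_block = filter_block

        def log(self, message):
            if self.filter_block(message):
                super().log(message)

    return FilteredLog


def is_positive(each):
    return each > 0


in_a = make_filtered(Log)(is_positive)     # version A plus the change
in_b = make_filtered(LogInB)(is_positive)  # version B plus the change
in_a.log_all([-5])
in_b.log_all([-5])
print(in_a.messages)  # the negative message was filtered out
print(in_b.messages)  # the filter was silently bypassed
```

Both variants load and run without any error, which is what makes this kind of conflict hard to notice without a dedicated test.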

3. Our Solution: DeltaImpactFinder 


In a nutshell, we propose to detect semantic merge conflicts
using CIA. On a merge, DeltaImpactFinder analyzes
and compares the impact of a change in its origin and
destination branches. We call the difference between these
two impacts the delta-impact. If the delta-impact is empty,
there is no indicator of a semantic conflict and the merge can
continue automatically. Otherwise, the delta-impact contains
the possible sources of conflict.

In the following we describe our approach. First, we
define the notions of dependency and impact. Then, based
on these notions, we explain delta-impact.

3.1 Dependency and Impact 

Change Impact Analysis (CIA) aims at identifying the
potential consequences of a change in a codebase. Typically,
CIA techniques establish dependency relationships between
the code entities of the codebase, which they use afterwards
for detecting the set of entities that are impacted by a change.
The rationale is that when a code entity changes, the behavior
of the dependent entities is impacted. In the context of
this paper, we define dependency as follows:

Definition 1. Dependency. A dependency is the relationship
between two code entities where one code entity
(source) requires the other (target). We denote it
source → target.

In the motivational example we introduced in Section 2,
the FilteredLog class depends on the Log class because
of the inheritance relationship between them. In this paper
we focus on static dependency analysis, i.e., dependencies
that are explicit in the source code; however, we believe that
DeltaImpactFinder can be generalized to other kinds of
dependencies. We describe the dependencies of
DeltaImpactFinder in more detail in Section 5.1.
Then, we define the impact of a Δ as follows:

Definition 2. Impact. The impact of a change Δ in a
codebase C, denoted I(Δ, C), is the set of dependencies
introduced or removed in C after applying Δ.

In the motivational example, the impact of ΔF in its origin
branch includes the following dependency modifications
(Figure 2):

i1 Introduction of an inheritance dependency from
FilteredLog to Log.

i2 Introduction of a message send dependency
from Log»logAll: to FilteredLog»log:.

i3 Introduction of a message send dependency
from FilteredLog»log: to Log»log:.
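
Definition 2 can be made concrete with plain sets. The sketch below is an illustration in Python, not the prototype's code: a dependency is modeled as a (kind, source, target) tuple, the dependency sets are written by hand to match the three introductions listed above, and the `impact` function with its `introduced`/`removed` split is an assumption of this sketch.

```python
from typing import NamedTuple


class Dependency(NamedTuple):
    kind: str
    source: str
    target: str


def impact(deps_before, deps_after):
    """Dependencies introduced or removed by a change (Definition 2)."""
    return {
        "introduced": deps_after - deps_before,
        "removed": deps_before - deps_after,
    }


# Hand-written dependency set of the origin codebase (no FilteredLog yet).
deps_origin = {
    Dependency("message send", "Log>>logAll:", "Log>>log:"),
}
# Dependency set of the origin codebase after applying the change.
deps_origin_plus_change = deps_origin | {
    Dependency("inheritance", "FilteredLog", "Log"),
    Dependency("message send", "Log>>logAll:", "FilteredLog>>log:"),
    Dependency("message send", "FilteredLog>>log:", "Log>>log:"),
}

result = impact(deps_origin, deps_origin_plus_change)
for dependency in sorted(result["introduced"]):
    print(dependency.kind, ":", dependency.source, "->", dependency.target)
```

Keeping introductions and removals apart is a design choice of the sketch; their union is the single set that the definition describes.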

Figure 2. Impact of ΔF in A. In the origin branch, ΔF
introduces three dependencies: one corresponding to an
inheritance relationship, and the others corresponding to
message sends.

3.2 DeltaImpactFinder

In the example (Figure 3), the comparison of the impact of
ΔF in origin and destination branches shows that the
dependency from Log»logAll: to FilteredLog»log: is missing in
the destination branch. Precisely, the cause of the semantic
conflict in the example is the change in Log»logAll:, which
no longer invokes FilteredLog»log:.

We observe that the impact of a Δ contains a set of
relationships between code entities in a particular version of
code. Informally, we can think about this impact as a set of
constraints that have to be satisfied for the code to work as
expected. Then, we pose the following hypothesis:

A semantic conflict appears when the set of dependencies
that a Δ introduces (or removes) into its origin branch is
different from those introduced (or removed) when
merging such Δ in the destination branch.


Figure 3. Delta-impact of ΔF from A to B. Comparison
of the impact of ΔF in both A and B. The dependency from
Log»logAll: to FilteredLog»log: is missing in B. (Legend:
dependency introduced in A and B; dependency missing in B.)
In other words, if there are missing or extra dependencies
in the destination branch, this could mean that Δ has a
different meaning than the one intended by its author. On
the contrary, if the dependencies are the same, the change
may have the same effects. Then, we define delta-impact as
follows:

Definition 3. Delta-Impact. The delta-impact of a change
Δ with origin branch A and destination branch B, denoted
DI(Δ, A, B), is the symmetric difference* between the
impact of Δ in A and the impact of Δ in B.

To compute the delta-impact of Δ, we obtain the impact
of Δ in the origin and destination branches, and then we
compute the symmetric difference between the two impacts.
In the example (Figure 4), the impact I(ΔF, A) yields the
set {i1, i2, i3}, while I(ΔF, B) yields the set {i1, i3}. Then,
DI(ΔF, A, B) results in the set {i2}, because i2 is missing
in the destination branch B. In this context, a tool that
computes and shows the delta-impact of ΔF to the integrator
would answer the questions Q1 and Q2 that we defined in
Section 2.
Figure 4. DeltaImpactFinder. I(ΔF, A) is the original
impact of ΔF, while I(ΔF, B) is the destination impact.
DI(ΔF, A, B) is the delta-impact of ΔF in the destination
branch, which shows that a dependency introduction is
missing (i2).
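
The computation of this example fits in a few lines. The following Python sketch only replays the set arithmetic of Definition 3 on the three dependency labels of the example; the function name `delta_impact`, the `missing`/`extra` split and the printed messages are assumptions of the sketch, not part of the prototype.

```python
def delta_impact(impact_in_origin, impact_in_destination):
    """Symmetric difference of two impacts (Definition 3)."""
    return impact_in_origin ^ impact_in_destination


impact_in_a = {"i1", "i2", "i3"}  # impact of the change in origin branch A
impact_in_b = {"i1", "i3"}        # impact of the change in destination branch B

di = delta_impact(impact_in_a, impact_in_b)
missing_in_b = di & impact_in_a   # expected by the author, absent after the merge
extra_in_b = di & impact_in_b     # appear only after the merge

if di:
    print("possible semantic conflict:", sorted(di))
else:
    print("no indicator of a semantic conflict")
```

Splitting the result into missing and extra dependencies is optional, but it tells the integrator in which direction the two branches disagree.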

4. Applicability 

In this section we describe concrete scenarios where we
would like to validate DeltaImpactFinder in future
work. Since our plan is to validate our approach with the Pharo
community, we present these scenarios in terms of Pharo.

4.1 Aged Code Changes 

The codebase of a large open-source project like Pharo
evolves through code changes that developers submit. When
a developer submits a code change, this change must pass a
reviewing process before an integrator merges it into the
main development branch. Since this reviewing process
takes some time and code changes are integrated every day,
when the integrator has to merge an approved code change,
the current Pharo codebase may be different from the
codebase where the author of the change worked.

In this situation, DeltaImpactFinder can help Pharo
integrators to discover semantic conflicts on the merge.
Given a code change Δ submitted by a developer,
DeltaImpactFinder can answer:

“Is the impact of Δ in the current Pharo the same as the
impact in the Pharo where Δ was originally created?”

* The symmetric difference between two sets includes only the elements that
belong to one of the sets but not to both.

4.2 Software Migration 

Often, when the codebase of a project changes, other projects
that depend on it need to be migrated. This change
propagation is known as ripple effect [20]. Ripple effects are
problematic because a small change in a project can have a very
large impact on other projects. Additionally, code that needs
migration sometimes remains undiscovered for a long
time due to low test coverage.

We can illustrate this scenario by rephrasing the FBCP
example used in Section 2. Let’s suppose that a developer
works on a project in Pharo version A. The system provides
the class Log, which the developer extends in the project by
creating the subclass FilteredLog. One year later, a new
stable version of Pharo is available: version B. Among plenty
of changes in Pharo B, the method Log»logAll: has been
modified as in the FBCP example. As before, a bug appears
in the method logAll: when invoked on an instance
of FilteredLog. Note that when the developer loads the
FilteredLog class in Pharo B, the system does not raise any
load or compilation error: it is another form of the semantic
merge conflict. The developer will probably have to debug
the project to find that the change in Log»logAll: is
responsible. A tool informing the developer that
Log»logAll: changed would save this time.

We can pose this problem in terms of our approach. When
the FilteredLog package is loaded in Pharo A, some
dependencies are introduced between code entities of FilteredLog
and Pharo. This is what we have defined as impact. When the
package is loaded in Pharo B, the impact is different: i2 is
missing (as in the delta-impact computed in Section 3.2). In
general terms, DeltaImpactFinder can help developers to
answer the following question:

“Which Pharo code entities with impact on my project
changed from Pharo A to B?”
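
One possible reading of this question is a filter over the delta-impact that keeps only the entities living outside the migrated project. The Python sketch below follows that reading; the function `system_entities_to_review`, the pair representation of a dependency and the `Class>>selector` naming are assumptions made for illustration, not the interface of the prototype.

```python
def system_entities_to_review(delta_impact, project_classes):
    """Entities of a delta-impact that live outside the migrated project."""
    entities = set()
    for source, target in delta_impact:
        for entity in (source, target):
            owner = entity.split(">>")[0]  # class part of 'Class>>selector'
            if owner not in project_classes:
                entities.add(entity)
    return entities


# Delta-impact of loading the FilteredLog package in version B instead of A.
delta_impact = {("Log>>logAll:", "FilteredLog>>log:")}
to_review = system_entities_to_review(delta_impact, {"FilteredLog"})
print(sorted(to_review))
```

In the example, the only entity reported is the system method whose modification bypasses the filter.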

4.3 Requirements 

The main requirement of any CIA technique is high precision
and high recall [9]. High precision means that a
technique finds substantially more relevant results than
irrelevant ones, while high recall means that a technique finds
most of the relevant results.

However, we extract some additional requirements for the
implementation of DeltaImpactFinder from the scenarios
presented above in this section:

Isolation from tool’s environment. For supporting “Software
Migration”, the implementation needs to compute
dependencies of code entities as if they were loaded in
some arbitrary Pharo version, independently of the Pharo
version where the tool is actually running.

Usable in real use cases. Since we aim at building a tool 
that real developers can evaluate, the implementation 
should compute the dependencies in a reasonable time. 




5. Implementation 

We start this section with an overview of the main
characteristics of our prototype implementation of
DeltaImpactFinder. Some design decisions are a consequence of
the requirements stated in Section 4.3.

Static code analysis. We use default Pharo support for
performing static code analysis. For example, the AST-Core
package provides support for visiting the abstract syntax
tree of methods and collecting dependencies.

Light-weight and polymorphic code metamodel. We
implemented RingFicus, a metamodel for Pharo code
entities. It provides first-class representations for class,
method, instance variable, etc. RingFicus allows modeling
code either internal or external to the current Pharo
environment, and browsing, querying, and analyzing it as if
it were loaded in the system.

5.1 Computing Dependencies 

DeltaImpactFinder requires computing dependencies
between source code entities. In our solution such entities
are classes, metaclasses, instance and class variables, traits,
class-traits and methods. The relationships that we consider
as dependencies in DeltaImpactFinder are the following:

Inheritance: A dependency from a class to its superclass. 

Trait Usage: A dependency from a class, metaclass, trait, or 
class-trait to all traits in its trait composition. 

Variable Access: A dependency from a method in a class 
or metaclass that accesses (read or write) an instance or 
class variable, to the accessed variable. 

Message Send: A dependency from a method including a
message-send expression, to all the possible methods that
are invoked. Due to the absence of type information in
the language, in the general case the algorithm uses the
selector of a message-send to look up all the implementors
in the codebase. However, in the case of self-sends
and super-sends the algorithm refines its lookup
to obtain more precise dependencies.

For testing our prototype, we implemented DependencyMiner,
which iterates over all the source code entities of
a Pharo environment collecting the dependencies. Each
dependency is an association source → target.
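
To illustrate the inheritance and message-send rules above, here is a minimal Python sketch of a miner over a toy code model. It is not the DependencyMiner of the prototype: the dictionary-based model, the `mine` and `lookup` functions, and in particular the way self-sends are refined (one lookup per class that may be the receiver) are assumptions of this sketch, since the text does not detail that refinement. Trait usage and variable access are left out.

```python
def superclass_chain(classes, name):
    """Yield a class name followed by its ancestors, nearest first."""
    while name is not None:
        yield name
        name = classes[name]["super"]


def lookup(classes, start, selector):
    """Return 'Class>>selector' for the first implementor found from start upward."""
    for name in superclass_chain(classes, start):
        if selector in classes[name]["methods"]:
            return f"{name}>>{selector}"
    return None


def mine(classes):
    """Collect (source, target) dependencies of a toy codebase."""
    deps = set()
    for name, cls in classes.items():
        if cls["super"] is not None:
            deps.add((name, cls["super"]))  # inheritance
        for selector, sends in cls["methods"].items():
            source = f"{name}>>{selector}"
            for receiver, sent in sends:
                if receiver == "super":
                    targets = {lookup(classes, cls["super"], sent)}
                elif receiver == "self":
                    # The receiver may be an instance of this class or of a subclass.
                    targets = {
                        lookup(classes, other, sent)
                        for other in classes
                        if name in superclass_chain(classes, other)
                    }
                else:
                    # No type information: every implementor of the selector.
                    targets = {
                        f"{other}>>{sent}"
                        for other, c in classes.items()
                        if sent in c["methods"]
                    }
                deps |= {(source, t) for t in targets if t is not None}
    return deps


# Toy model of the logging library in its origin version, with FilteredLog loaded.
codebase = {
    "Log": {
        "super": None,
        "methods": {"log:": [("other", "add:")], "logAll:": [("self", "log:")]},
    },
    "FilteredLog": {
        "super": "Log",
        "methods": {"log:": [("other", "value:"), ("super", "log:")]},
    },
}
for source, target in sorted(mine(codebase)):
    print(source, "->", target)
```

Sends whose selector has no implementor in the model, such as `add:` and `value:` here, produce no dependency; a real environment would resolve them against the system classes.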

5.2 Computing Impact and Delta-Impact 

For the computation of the impact of Δ in an environment
C (Figure 5), the algorithm starts by building the codebase
C + Δ. Then, the prototype computes the dependency sets
of C and C + Δ using the DependencyMiner described
above. Finally, the prototype computes the impact and the
delta-impact by performing Collection operations.


Figure 5. Computation of the impact of Δ in C. The
algorithm starts by building the codebase C + Δ. Afterwards, the
algorithm computes the dependency sets of each codebase
(D(C) and D(C + Δ)). Finally, the algorithm computes the
impact by finding the symmetric difference between D(C)
and D(C + Δ).
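
Read together with Definitions 2 and 3, the caption above suggests the following compact notation. It is a restatement rather than a formula taken from the paper: the symbol \triangle for symmetric difference is a notational assumption, D denotes the dependency set of a codebase, and C stands for either branch A or B.

```latex
\begin{align*}
I(\Delta, C) &= D(C) \,\triangle\, D(C + \Delta) \\
DI(\Delta, A, B) &= I(\Delta, A) \,\triangle\, I(\Delta, B) \\
X \,\triangle\, Y &= (X \setminus Y) \cup (Y \setminus X)
\end{align*}
```

The delta-impact therefore requires four dependency sets: the origin and destination codebases, each with and without the change.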


6. Related Work 

Change Impact Analysis. In an exhaustive survey [13]
about CIA, Li et al. analyzed 30 publications from 1997
to 2010 and identified 23 code-based CIA techniques. The
study characterizes the CIA techniques, and identifies key
applications of CIA techniques in software maintenance.
Typically, CIA techniques establish dependency relationships
between the code entities of the codebase, which they
use afterwards for detecting the set of entities that are
impacted by a change. The rationale is that when a code
entity changes, the behavior of the dependent entities is
impacted. The techniques to identify dependencies in a
codebase are typically classified into static and dynamic. The
static techniques identify the dependencies using static code
analysis [14] [19], while dynamic techniques [2] [8] collect
data from program execution. There are, as well, mixed
techniques which combine both. DeltaImpactFinder
is orthogonal to the technique used to identify
dependencies, although our prototype works with a static CIA
technique.

Merging. Semantic merge conflicts have been studied
before under different names. Mens describes this
problem in his survey of code merging. In this work, the
author remarks that most approaches to software merging have
been validated on imperative programming languages and it
is not trivial to port these approaches to the object-oriented
paradigm, due to late binding and polymorphism in
object-oriented programming languages.

Ring Metamodel. Ring [17] [16] is a source code metamodel
that serves as a unified infrastructure for building
tools in Pharo. While Ring has proven effective for
dependency analysis tools, we found some limitations that
drove us to implement our own RingFicus package. In our
early tests, Ring did not fulfill the requirements we described
in Section 4.3: Ring code entities did not ensure isolation from
the tool’s environment, and they were not efficient enough to
represent a whole Pharo environment.

7. Conclusion and Future Perspectives 

In this paper, we proposed a solution to help integrators in
the detection of semantic merge conflicts using CIA. On a
merge, DeltaImpactFinder analyzes and compares the
impact of a change in its origin and destination branches.
We call the difference between these two impacts the
delta-impact. If the delta-impact is empty, there is no
indicator of a semantic merge conflict and the merge can
continue automatically. Otherwise, the delta-impact contains
the possible sources of conflict.

In short, this paper makes the following contributions: 

• a description of semantic merge conflict with an example; 

• a CIA technique, named DeltaImpactFinder, to detect
semantic merge conflicts;

• a discussion of concrete scenarios to validate this technique
in future work;

• a prototype of this technique implemented in Pharo. 